Aller au contenu principal

ETL - ELT

 

ETL vs ELT

ETL - Extract, Transform, Load

ETL is a process that extracts data from a source, transforms it into a format that is suitable for analysis, and then loads it into a storage.

ELT - Extract, Load, Transform

ELT is a process that extracts data from a source, loads it into a storage, and then transforms it into a format that is suitable for analysis.

 


Processing Steps

Extract

The first step in the ETL process is to extract data from a source. This source could be a database, a file, or an API. The data is extracted in its raw form, and is not yet ready for analysis.

Here is different ways to extract data:

  • Progressive Extraction: Extracting data incrementally, for example, only extracting data that has been added or updated since the last extraction.

  • Full Extraction: Extracting all data from the source, regardless of whether it has been added or updated since the last extraction.

  • Notification Updates: Using notifications to trigger an extraction when data has been added or updated.

Transform

The next step in the ETL process is to transform the data into a format that is suitable for analysis, data science, or machine learning. This could involve cleaning the data, removing duplicates, and converting data types.

Here are some common transformations:

  • Cleaning: Removing or correcting errors in the data.
  • Deduplication: Removing duplicate records from the data.
  • Normalization: Converting data into a standard format.
  • Aggregation: Combining data from multiple sources into a single dataset.
  • Derivation: Creating new columns or features from existing data.
  • Filtering: Removing unwanted data from the dataset.
  • Joining: Combining data from multiple tables or datasets.
  • ...

Load

The final step in the ETL process is to load the transformed data into a storage. This storage could be a database, a data warehouse, or a data lake. The data is now ready for analysis, data science, or machine learning.

Here are some common ways to load data:

  • Full Load: Loading all the transformed data into the storage.
  • Incremental Load: Loading only the data that has been added or updated since the last load.
  • Icremental Load Streaming: Loading data in real-time as it is extracted and transformed.

 


ETL vs ELT

ETL and ELT are two different approaches to data processing. The main difference between the two is the order of steps in the process.

Common Points

  • Objective: ETL and ELT have the same objective, who is to enable companies to exploit their data.
  • Automation: Both ETL and ELT can be automated using tools and platforms.
  • Data Governance: Both ETL and ELT require data governance to ensure data quality and compliance.

Impacts of Differences

AspectETLELT
DisponibilityNeed to know in advance what you plan to do with your dataStore the data and decide later how to use it
FlexibilityLess flexible because data is transformed before being loadedMore flexible as data is loaded, u can choose how to use it later
AccessibilityData is transformed before loading, which can make access harderData is stored in raw form, making it more accessible in some cases
EvolutionLess evolutive as transformations are predefinedMore evolutive as you can store data and decide later how to use it
StorageRequires less storage as only transformed data is storedRequires more storage as raw data is stored

Which to Choose ?

  • Syncronization: If you need to synchronize data between different systems, ETL is the best choice.
  • Modernization: If you want to modernize your data architecture, ELT is the best choice for the migration.
  • Data Volume: If you have a large volume of data to treat, ELT is the best choice because you can store the data and decide when to transform it.
  • Vitesse Access: If you need to access the data quickly, ELT can be useful because this may involve less unnecessary delay in accessing data.